Energy Dynamics Laboratory - Visual Analytics of Epidemic Spread
Provide a short description of
the tool(s) you used. Mention where and when it was developed.
Additional credit to developers of the tools can be provided
here, and links to find more information on the tool.
If the tool used is a toolkit, rate the effort needed to customize the toolkit for this specific analysis. Consider such things as programming ability required, amount of time needed.
(250 words MAX)
The tools used for this analysis are:
RapidMiner
Gnuplot
GIMP
Python Scripting Language
Native Linux shell utilities like grep, sort, uniq, cut, awk, etc.
These are all either open source or freely available. The
analysis was done entirely on a Linux PC running Ubuntu 10.04.2.
Video:
ANSWERS:
MC 1.1 Origin and Epidemic
Spread: Identify approximately where the outbreak started on the map
(ground zero location). If possible, outline the affected area.
Explain how you arrived at your conclusion.
First, the file Microblogs.csv was filtered to extract messages that mention symptoms. A daily histogram of these messages is shown in Figure 1.
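The filtering and daily counting step can be sketched in Python as follows. This is a minimal illustration, not the script used for the analysis: the symptom keyword list, the timestamp format, and the sample messages are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime

# Hypothetical keyword list; the actual symptom terms used are not listed here.
SYMPTOM_TERMS = ("fever", "chills", "cough", "diarrhea", "vomit", "nausea")


def has_symptom(text, terms=SYMPTOM_TERMS):
    """Return True if the message text mentions any symptom term."""
    lowered = text.lower()
    return any(term in lowered for term in terms)


def daily_symptom_counts(rows, time_format="%m/%d/%Y %H:%M"):
    """Count symptom messages per calendar day.

    rows: iterable of (timestamp_string, text) pairs.
    time_format is an assumption about how timestamps are written.
    """
    counts = Counter()
    for stamp, text in rows:
        if has_symptom(text):
            counts[datetime.strptime(stamp, time_format).date()] += 1
    return counts


# Made-up sample messages, for illustration only.
sample = [
    ("5/17/2011 21:10", "Great concert tonight"),
    ("5/18/2011 8:05", "Woke up with a fever and chills"),
    ("5/18/2011 8:40", "This cough will not stop"),
    ("5/18/2011 9:15", "Traffic is terrible downtown"),
]
counts = daily_symptom_counts(sample)
for day in sorted(counts):
    print(day, counts[day])
```

With the real file, the (timestamp, text) pairs could be read with Python's csv module. The column names would have to be taken from the file header.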
From this figure it is clear that the outbreak
started on Day 138 (May 18, 2011). Ground zero
was identified by examining the hour-by-hour evolution of messages on
May 18th. A spike in messages began at 8 AM,
indicating the start of the outbreak. Figure 2 shows the locations of
these messages.
Figure 2: Location of messages on May 18, 2011 between 8 and 9 AM
Figure 2 places ground zero Downtown, near the Vastopolis Dome, Vastopolis City Hospital, and the Convention Center.
The most affected areas are Downtown and the parts of Smogtown, Westside, and Painville along the banks of the river, as marked in Figure 2 (elaborated in section MC 1.2 and in the video).
We hypothesize that the epidemic is transmitted person-to-person
and through water, but not through the air. This is based on the
following observations.
The messages were separated into one-hour intervals for each day of
the epidemic, and their spatial locations were plotted on the map. Figure 3
shows such a plot for May 18th between 8 AM and 9 AM. Compare this to
Figure 4, which shows the locations of messages sent between 10 AM and
11 AM on the same day. The contrast of the Vastopolis map has been reduced
for this visualization so that both the red markers for the messages and
the map features are recognizable.
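The hourly separation can be illustrated with a short Python sketch. It assumes that each message carries a timestamp string and a location string of the form "latitude longitude". Both formats, the function names, and the sample coordinates are assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime


def parse_location(location):
    """Split a 'latitude longitude' string into two floats (assumed format)."""
    lat, lon = location.split()
    return float(lat), float(lon)


def points_by_hour(rows, time_format="%m/%d/%Y %H:%M"):
    """Group message coordinates by (date, hour of day).

    rows: iterable of (timestamp_string, location_string) pairs.
    """
    buckets = defaultdict(list)
    for stamp, location in rows:
        when = datetime.strptime(stamp, time_format)
        buckets[(when.date(), when.hour)].append(parse_location(location))
    return buckets


# Made-up sample rows, for illustration only.
sample = [
    ("5/18/2011 8:05", "42.22 -93.40"),
    ("5/18/2011 8:40", "42.23 -93.41"),
    ("5/18/2011 10:20", "42.24 -93.35"),
]
buckets = points_by_hour(sample)
for (day, hour), points in sorted(buckets.items()):
    print(day, hour, len(points))
```

Each bucket holds the coordinates for one hour of one day, so a single bucket can be written out and plotted over the map image.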
Figure 3: Spatial location of messages between 8 AM and 9 AM on May 18, 2011
Figure 4: Spatial location of messages between 10 AM and 11 AM on May 18, 2011
Comparing Figures 3 and 4 shows that
the epidemic has spread from the Downtown area in all directions, reaching
Northville, Cornertown, Lakeside, Suburbia, and other neighborhoods. Visual
comparison makes it clear that the spread in the Eastward direction is
more pronounced than in the Westward direction. This is also supported
by the numerical results of a statistical analysis of the Eastern and
Western halves of the map. This shows that human-to-human
transmission is one of the ways the epidemic is spreading. It also
shows that the epidemic is not spreading through airborne means, since
the weather data shows that the wind direction was Westward on May 18th.
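One simple way to quantify the East-West comparison is to count messages on each side of a reference longitude through ground zero. The sketch below is an illustration under that assumption. It is not the statistical analysis referred to above, and the reference longitude and sample points are made up.

```python
def east_west_counts(points, reference_lon):
    """Count points east and west of reference_lon.

    points: sequence of (lat, lon) pairs; a larger longitude is farther east.
    Points exactly on the reference longitude are ignored.
    """
    east = sum(1 for _, lon in points if lon > reference_lon)
    west = sum(1 for _, lon in points if lon < reference_lon)
    return east, west


def east_share(points, reference_lon):
    """Return the fraction of counted points lying east, or None if there are none."""
    east, west = east_west_counts(points, reference_lon)
    total = east + west
    return east / total if total else None


# Made-up coordinates and reference longitude, for illustration only.
reference_lon = -93.40
points = [(42.22, -93.35), (42.23, -93.38), (42.21, -93.45), (42.25, -93.39)]
print(east_west_counts(points, reference_lon))
print(east_share(points, reference_lon))
```

An East share well above one half in the hours after the outbreak, while the wind blows Westward, is the pattern that argues against airborne spread.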
Figures 5, 6, and 7 below show the spatial distribution of all messages
sent on May 18th, 19th, and 20th respectively. These also confirm that
the epidemic is not spreading through airborne means. If it were,
we would expect a much higher density of cases in the
Western half on May 19th (since the wind was blowing Westward on May
18th) and NNW of ground zero on May 20th (since the
wind was blowing in the NNW direction on May 19th). Figures 6 and 7
do not show any such pattern.
However, Figures 5, 6, and 7 do
indicate an increasing number of cases along the banks of the river South
of the ground zero latitude, in Westside, Smogtown, and Painville. This
supports the hypothesis that the epidemic is spreading through waterborne
means, and is consistent with the fact that the river flows South.
The fact that this increasing density of cases along the banks of the
river is not observed North of the ground zero latitude also supports this
hypothesis.
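The river-bank comparison could be checked numerically along the following lines. This is a rough sketch under stated assumptions: the river is approximated by a few hypothetical sample points, distance is measured in plain degrees rather than metres, and the threshold, ground zero latitude, and message coordinates are made up.

```python
import math


def near_river(point, river_points, max_dist_deg):
    """Return True if point lies within max_dist_deg of any river sample point."""
    lat, lon = point
    return any(
        math.hypot(lat - rlat, lon - rlon) <= max_dist_deg
        for rlat, rlon in river_points
    )


def river_counts(points, river_points, ground_zero_lat, max_dist_deg=0.01):
    """Count near-river points south and north of the ground zero latitude."""
    south = north = 0
    for point in points:
        if near_river(point, river_points, max_dist_deg):
            if point[0] < ground_zero_lat:
                south += 1
            else:
                north += 1
    return south, north


# Hypothetical river samples (north to south) and made-up message locations.
river_points = [(42.30, -93.40), (42.25, -93.40), (42.20, -93.40), (42.15, -93.40)]
ground_zero_lat = 42.22
points = [
    (42.201, -93.402),  # near the river, south of ground zero
    (42.155, -93.395),  # near the river, south of ground zero
    (42.295, -93.401),  # near the river, north of ground zero
    (42.20, -93.30),    # far from the river
]
print(river_counts(points, river_points, ground_zero_lat))
```

Running this per day would show whether the South count grows from May 18th to May 20th while the North count does not, which is the pattern described above.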
Figure 5: Spatial distribution of all messages for May 18, 2011
Figure 6: Spatial distribution of all messages for May 19, 2011
Figure 7: Spatial distribution of all messages for May 20, 2011